Chapter on datasets, generalization, and overfitting from Machine Learning Crash Course.
developers.google.com/machine-learning/crash-course/overfitting
Introduction
Data characteristics
Types of data
Quantity of data
Quality and reliability of data
Complete vs. incomplete examples
Labels
Direct versus proxy labels
Human-generated data
Imbalanced datasets
Downsampling and Upweighting
Rebalance ratios
Dividing the original dataset
Training, validation, and test sets
Additional problems with test sets
Transforming data
Generalization
Overfitting
Fitting, overfitting, and underfitting
Detecting overfitting
What causes overfitting?
Generalization conditions
Model complexity
Regularization
What is complexity?
L2 regularization
Regularization rate (lambda)
Early stopping: an alternative to complexity-based regularization
Finding equilibrium between learning rate and regularization rate
Interpreting loss curves
What’s next?